Semantic-guided predictive modeling and relational learning within industrial knowledge graphs
The ubiquitous availability of data in today’s manufacturing environments, mainly driven by the extended usage of software and built-in sensing capabilities in automation systems, enables companies to embrace more advanced predictive modeling and analysis in order to optimize processes and usage of equipment. While the potential insight gained from such analysis is high, it often remains untapped, since integrating and analyzing data silos from different production domains requires high manual effort and is therefore not economical. Addressing these challenges, digital representations of production equipment, so-called digital twins, have emerged, leading the way to semantic interoperability across systems in different domains. From a data modeling point of view, digital twins can be seen as industrial knowledge graphs, which are used as the semantic backbone of manufacturing software systems and data analytics. Because the prevalent manufacturing software landscape has grown historically, is scattered, and comprises numerous proprietary information models, data sources are highly heterogeneous. Therefore, there is an increasing need for semi-automatic support in data modeling, enabling end-user engineers to model their domain and maintain a unified semantic knowledge graph across the company. Once data modeling and integration are done, further challenges arise, since there has been little research on how knowledge graphs can contribute to the simplification and abstraction of statistical analysis and predictive modeling, especially in manufacturing.
In this thesis, new approaches for modeling and maintaining industrial knowledge graphs with a focus on the application of statistical models are presented. First, concerning data modeling, we discuss requirements from several existing standard information models and analytic use cases in the manufacturing and automation system domains and derive a fragment of the OWL 2 language that is expressive enough to cover the required semantics for a broad range of use cases. The prototypical implementation enables domain end-users, i.e., engineers, to extend the base ontology model with intuitive semantics. Furthermore, it supports efficient reasoning and constraint checking via translation to rule-based representations. Based on these models, we propose an architecture for the end-user facilitated application of statistical models using ontological concepts and ontology-based data access paradigms.
In addition, we present an approach for domain knowledge-driven preparation of predictive models in terms of feature selection and show how schema-level reasoning in the OWL 2 language can be employed for this task within knowledge graphs of industrial automation systems. A production cycle time prediction model in an example application scenario serves as a proof of concept and demonstrates that feature selection based on axiomatized domain knowledge can perform competitively with purely data-driven selection. In the case of high-dimensional data with small sample size, we show that graph kernels of domain ontologies can provide additional information on the degree of variable dependence. Furthermore, a special application of feature selection in graph-structured data is presented, and we develop a method that incorporates domain constraints derived from meta-paths in knowledge graphs into a branch-and-bound pattern enumeration algorithm.
Lastly, we discuss the maintenance of facts in large-scale industrial knowledge graphs, focusing on latent variable models for the automated population and completion of missing facts. State-of-the-art approaches cannot deal with time-series data in the form of events that naturally occur in industrial applications. Therefore, we present an extension of learning knowledge graph embeddings in conjunction with data in the form of event logs. Finally, we design several use case scenarios of missing information and evaluate our embedding approach on data coming from a real-world factory environment.
We conclude that industrial knowledge graphs are a powerful tool that can be used by end-users in the manufacturing domain for data modeling and model validation. They are especially suitable for the facilitated application of statistical models in conjunction with background domain knowledge, since they provide information about features upfront. Furthermore, relational learning approaches showed great potential to semi-automatically infer missing facts and provide recommendations to production operators on how to keep stored facts in sync with the real world.
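To illustrate how a knowledge graph embedding can rank candidates for a missing fact, here is a minimal sketch using a TransE-style translation score. This is a swapped-in, generic technique and not the thesis's event-log extension, which the abstract does not specify; the entity names, relation name, and vectors are made-up toy values.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail.

    A higher (less negative) score means the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entity_vectors):
    """Rank all candidate tail entities for an incomplete fact (head, relation, ?)."""
    return sorted(
        entity_vectors,
        key=lambda name: transe_score(head, relation, entity_vectors[name]),
        reverse=True,
    )


# Hypothetical 2-d embeddings; a real model would learn these from the graph.
entities = {
    "press_1": [0.0, 0.0],
    "line_A": [1.0, 0.0],
    "line_B": [0.0, 1.0],
}
located_in = [1.0, 0.1]

# Which entity most plausibly completes (press_1, located_in, ?)
ranking = rank_tails(entities["press_1"], located_in, entities)
print(ranking)
```

In this toy setup the top-ranked candidate would be offered to an operator as a recommendation rather than written into the graph automatically.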
Active Learning with Tabular Language Models
Despite recent advancements in tabular language model research, real-world
applications are still challenging. In industry, there is an abundance of
tables found in spreadsheets, but acquisition of substantial amounts of labels
is expensive, since only experts can annotate the often highly technical and
domain-specific tables. Active learning could potentially reduce labeling
costs; however, so far no work has studied active learning in
conjunction with tabular language models. In this paper, we investigate
different acquisition functions in a real-world industrial tabular language
model use case for sub-cell named entity recognition. Our results show that
cell-level acquisition functions with built-in diversity can significantly
reduce the labeling effort, while enforced table diversity is detrimental. We
further see open fundamental questions concerning computational efficiency and
the perspective of human annotators. Comment: 8 page
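As a rough illustration of a cell-level acquisition function, the sketch below ranks unlabeled cells by predictive entropy and selects the most uncertain ones for annotation. This is plain uncertainty sampling under assumed inputs, not the acquisition functions evaluated in the paper, and it omits the diversity component; the cell identifiers and probabilities are invented.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_cells(cell_probs, budget):
    """Return the ids of the `budget` cells with the highest predictive entropy."""
    ranked = sorted(cell_probs, key=lambda cid: entropy(cell_probs[cid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs: (table id, row, column) -> class probabilities.
cell_probs = {
    ("t1", 0, 0): [0.98, 0.01, 0.01],
    ("t1", 0, 1): [0.40, 0.35, 0.25],
    ("t2", 3, 2): [0.34, 0.33, 0.33],
    ("t2", 1, 0): [0.70, 0.20, 0.10],
}

selected = select_cells(cell_probs, budget=2)
print(selected)
```

A diversity-aware variant would additionally penalize candidates that are too similar to cells already selected in the same batch.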
Adversarial Attacks on Tables with Entity Swap
The capabilities of large language models (LLMs) have been successfully
applied in the context of table representation learning. The recently proposed
tabular language models have reported state-of-the-art results across various
tasks for table interpretation. However, a closer look into the datasets
commonly used for evaluation reveals an entity leakage from the train set into
the test set. Motivated by this observation, we explore adversarial attacks
that represent a more realistic inference setup. Adversarial attacks on text
have been shown to greatly affect the performance of LLMs, but currently, there
are no attacks targeting tabular language models. In this paper, we propose an
evasive entity-swap attack for the column type annotation (CTA) task. Our CTA
attack is the first black-box attack on tables, where we employ a
similarity-based sampling strategy to generate adversarial examples. The
experimental results show that the proposed attack causes a drop in performance
of up to 70%. Comment: Accepted at TaDA workshop at VLDB 202
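The idea of an entity swap can be sketched as follows: each entity in a column is replaced by a similar entity of the same type, so the column's correct type label stays the same while the surface content changes. This is a simplified illustration that deterministically picks the nearest candidate by cosine similarity; the paper's actual sampling strategy, embeddings, and candidate pools are not given in the abstract, so every name and vector below is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def swap_entities(column, embeddings, candidates):
    """Replace each cell with its most similar candidate entity, excluding itself."""
    swapped = []
    for entity in column:
        pool = [c for c in candidates if c != entity]
        best = max(pool, key=lambda c: cosine(embeddings[entity], embeddings[c]))
        swapped.append(best)
    return swapped


# Hypothetical 2-d entity embeddings; all entities share the column type "city".
embeddings = {
    "Berlin": [1.0, 0.0],
    "Hamburg": [0.9, 0.1],
    "Munich": [0.0, 1.0],
    "Vienna": [0.1, 0.9],
    "Graz": [0.5, 0.5],
}
column = ["Berlin", "Munich"]
candidates = ["Hamburg", "Vienna", "Graz"]  # e.g. entities unseen during training

adversarial = swap_entities(column, embeddings, candidates)
print(adversarial)
```

Only the attacked model's prediction on the swapped column would then be needed to measure the effect, which is what makes such a setup black-box.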
Reasoning on Knowledge Graphs with Debate Dynamics
We propose a novel method for automatic reasoning on knowledge graphs based
on debate dynamics. The main idea is to frame the task of triple classification
as a debate game between two reinforcement learning agents which extract
arguments -- paths in the knowledge graph -- with the goal of arguing that the fact
is true (thesis) or false (antithesis), respectively. Based
on these arguments, a binary classifier, called the judge, decides whether the
fact is true or false. The two agents can be considered as sparse, adversarial
feature generators that present interpretable evidence for either the thesis or
the antithesis. In contrast to other black-box methods, the arguments allow
users to get an understanding of the decision of the judge. Since the focus of
this work is to create an explainable method that maintains a competitive
predictive accuracy, we benchmark our method on the triple classification and
link prediction tasks. We find that our method outperforms several
baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also
conduct a survey and find that the extracted arguments are informative for
users. Comment: AAAI-202
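To make the notion of arguments as paths concrete, the sketch below enumerates short paths between the subject and object of a query triple in a toy graph. The method described above uses reinforcement learning agents to extract such paths; exhaustive enumeration is swapped in here purely for illustration, and the triples and the `max_hops` parameter are hypothetical.

```python
def find_paths(triples, start, goal, max_hops=2):
    """Enumerate paths (lists of triples) from start to goal with at most max_hops edges."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    paths = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for p, o in outgoing.get(node, []):
            stack.append((o, path + [(node, p, o)]))
    return paths


# Toy graph; the query triple (aspirin, treats, headache) itself is left out,
# since it is the fact to be classified.
kg = [
    ("aspirin", "binds", "COX1"),
    ("COX1", "associated_with", "headache"),
    ("aspirin", "resembles", "ibuprofen"),
    ("ibuprofen", "treats", "headache"),
]

arguments = find_paths(kg, "aspirin", "headache")
for path in arguments:
    print(" -> ".join(f"{s} [{p}] {o}" for s, p, o in path))
```

A judge classifier would then score the query triple from a small set of such paths, which is what keeps the decision inspectable by users.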
Shape encoding for semantic healing of design models and knowledge transfer to scan-to-BIM
Automated parsing of design data will increasingly be a prerequisite for efficient data- and analytics-driven management of building portfolios. The high complexity and low rigidity of building information modelling (BIM) model exchange standards such as Industry Foundation Classes result in considerable differences in data quality and impede direct data availability for analytics-based decision support. Mis- or unclassified building elements are a common issue and can lead to tedious manual rework. At the same time, scan-to-BIM processes still require considerable manual effort to identify subclass element geometry. This work leverages a lightweight three-dimensional geometric algorithm to autonomously generate meaningful geometric features that assist shape classification in erroneous design models and pre-segmented point clouds. Geometric deep learning is introduced in two steps: a discussion of the benefits of graph convolutional networks (GCNs) is given before a set of experiments on BIM element data sets is conducted. Utilising explainable artificial intelligence methods, the GCN performance is made suitable for human-algorithm interaction. Using element geometry alone, the classification reaches a promising average performance of above 83% for the model-healing task with a reduced computation time. The encoded geometric knowledge from the design models is shown to be helpful in showcase examples of segment classification in point clouds. ISSN:2397-875